Recently several more efficient versions of quantum state tomography have been proposed, with the
purpose of making tomography feasible even for many-qubit states. The number of state parameters
to be estimated is reduced by tentatively introducing certain simplifying assumptions on the form
of the quantum state, and subsequently using the data to rigorously verify these assumptions. The
simplifying assumptions considered so far were (i) the state can be well approximated to be of low
rank, or (ii) the state can be well approximated as a matrix product state. We add one more
method in that same spirit: we allow in principle any model for the state, using any (small) number
of parameters (which can, e.g., be chosen to have a clear physical meaning), and the data are used
to verify the model. The proof that this method is valid cannot be as strict as in the above-mentioned
cases, but is based on well-established statistical methods that go under the name of "information
criteria." We exploit here, in particular, the Akaike Information Criterion (AIC). We illustrate the
method by simulating experiments on (noisy) Dicke states.



I. INTRODUCTION 

Quantum state estimation [1-3] remains one of the hot topics in the field of quantum information processing. The hope to recover each element in the density matrix, however, is impeded by the exponential growth of the number of matrix elements with the number of qubits, and the concomitant exponential growth in time and memory required to compute and store the density matrix. The task can become intimidating when 14 qubits are involved [4], and so efforts have been made to simplify quantum state tomography. One such effort focused on states that have high purity [5], so that the size of the state space shrinks significantly (from $O(D^2)$ to $O(D)$ for a system described by a $D$-dimensional Hilbert space). Given that the measurement record is used to verify the assumptions made initially, this method avoids the trap of simplification through imposing a priori assumptions merely by fiat. Another recent effort [6] in the same spirit considered multi-qubit states that are well represented by matrix product states (which require a number of parameters growing only polynomially with the number of qubits). Many states of interest, such as ground states of certain model Hamiltonians in condensed-matter physics, are of that form. Crucially, the particular form of the state can be verified by the data.

Here we go one step further, and we will allow, tentatively, any parametrized form for the density matrix of the quantum system to be tested, possibly containing just a few parameters. In fact, we may have several different tentative ideas of how our quantum state is best parameterized. The questions are then how the data reveal which of those descriptions work sufficiently well, and which description is the best. This idea corresponds to a well-developed field in statistics: model selection [7-12]. All mathematical descriptions of reality are in fact models (and a quantum state, pure or mixed, is an excellent example of a model), and they can be evaluated by judging their performance relative to that of the true model (assuming it exists). In order to quantify this relative performance, we will make use of the Kullback-Leibler divergence (also known as relative entropy) [13], which has the interpretation of the amount of information lost when a specific model is used rather than the true model. Based on the minimization of the Kullback-Leibler divergence over different models, the Akaike Information Criterion (AIC) [14] was developed as a ranking system so that models are evaluated with respect to each other, given measurement data. The only quantities appearing in the criterion are the maximum likelihood obtainable with a given model (i.e., the probability that the observed data would occur according to the model, maximized over all model parameters), and the number of independent parameters of the model.

The minimization does not require any knowledge of the true model, only that the testing model is sufficiently close to the true model. The legitimate application of AIC should, therefore, in principle be limited to "good" models, ones that include the true model (in our case, the exact quantum state that generated the data), at least to a very good approximation. However, this does not prevent one from resorting to the AIC for model evaluation when there is no such guarantee. In fact, Takeuchi studied the case where the true model does not belong to the model set and came up with a more general criterion, named the Takeuchi Information Criterion, TIC [15]. However, the estimation of the term introduced by Takeuchi to counterbalance the bias of the maximum likelihood estimator used in the AIC requires estimation of a $K \times K$ matrix ($K$ being the number of independent parameters used by a model) from the data, which, unfortunately, is prone to significant error. This reduces the practical appeal of the TIC. Since in most cases the AIC is still a good approximation to the TIC [11], especially when many data are available, we stick to the simpler and more robust criterion here.

Information criteria are designed to produce a relative (rather than absolute) ranking of models, so fixing a reference model is convenient. Throughout this paper we choose the "full-parameter model" (FPM) as reference, that is, a model with just enough independent variables to fully parameterize the measurement on our quantum system. For tomographically complete measurements (discussed in detail in Sec. III B) the number of independent variables is given by the number of free parameters in the density matrix ($D^2 - 1$ for a $D$-dimensional Hilbert space). For tomographically incomplete measurements (see Sec. III D), the number of independent variables of the FPM is smaller, and equals the number of independent observables. We will, in fact, not even need the explicit form of the FPM (which may be hard to construct for tomographically incomplete measurements), as its maximum possible likelihood can be easily upper-bounded.

We should note an important distinction between maximum likelihood estimation (MLE) [16], a technique often used in quantum tomography, and the method of information criteria and model selection. MLE produces the state that fits the data best. Now the data inevitably contain (statistical) noise, and the MLE state predicts, incorrectly, that same noise to appear in future data. Information criteria, on the other hand, have been designed to find the model that best predicts future data, and try, in particular, to avoid overfitting to the data by limiting the number of model parameters. This is how a model with a few parameters can turn out to be the best predictive model, even if, obviously, the MLE state will fit the (past) data better.

We also note that information criteria have been applied mostly in areas of research outside of physics. This is simply due to the happy circumstance that in physics we tend to know what the "true" model underlying our observations is (or should be), whereas this is much less the case in other fields. Within physics, information criteria have been applied to astrophysics [17], where one indeed may not know the "true" model (yet), but also to the problem of entanglement estimation [18]. In the latter case (and in quantum information theory in general) the problem is not that we do not know what the underlying model is, but that that model may contain far too many parameters. Hence the potential usefulness of information criteria. And as we recently discovered, the AIC has even been applied to quantum state estimation, not for the purpose of making it more efficient, but for making it more accurate, by avoiding overfitting [19].



II. THE AKAIKE INFORMATION CRITERION: A SCHEMATIC DERIVATION

Suppose we are interested in measuring certain variables, summarized as a vector $\mathbf{x}$, and their probability of occurrence as outcome of our measurement. We denote $f(\mathbf{x})$ as the probabilistic model that truthfully reflects reality (assuming for convenience that such a model exists) and $g(\mathbf{x}|\theta)$ as our (approximate) model characterized by one or more parameters, summarized as a vector $\theta$. The models satisfy the normalization condition $\int d\mathbf{x}\, f(\mathbf{x}) = \int d\mathbf{x}\, g(\mathbf{x}|\theta) = 1$ for all $\theta$. By definition, we say there is no information lost when $f(\mathbf{x})$ is used to describe reality. The amount of information lost when $g(\mathbf{x}|\theta)$ is used instead of the true model is defined to be the Kullback-Leibler divergence [13] between the model $g(\mathbf{x}|\theta)$ and the true model $f(\mathbf{x})$:



$$ I(f, g_\theta) = \int d\mathbf{x}\, f(\mathbf{x}) \log f(\mathbf{x}) - \int d\mathbf{x}\, f(\mathbf{x}) \log g(\mathbf{x}|\theta). \qquad (1) $$



Eq. (1) can be conveniently rewritten as

$$ I(f, g_\theta) = E_f[\log f(\mathbf{x})] - E_f[\log g(\mathbf{x}|\theta)], \qquad (2) $$

where $E_f[\cdot]$ denotes an expectation value with respect to the true distribution $f(\mathbf{x})$. We see that $\mathbf{x}$ is no longer a variable in the above expression, as we integrated it out. The only variable that affects $I(f, g_\theta)$ is $\theta$. Since the first term in Eq. (2) is irrelevant to the purpose of rank-ordering different models $g$ (not to mention we cannot evaluate it when $f$ is not known), we only have to consider the second term. Suppose there exists $\theta_0$ such that $g(\mathbf{x}|\theta_0) = f(\mathbf{x})$ for every $\mathbf{x}$, that is, the true model is included in the model set. Note that for this to hold, $\theta$ does not necessarily contain the same number of parameters as the dimension of the system. To use a simpler notation without the integration over $\mathbf{x}$ we denote the second term in Eq. (1) (without the minus sign) as



$$ S(\theta_0 : \theta) = \int d\mathbf{x}\, g(\mathbf{x}|\theta_0) \log g(\mathbf{x}|\theta), \qquad (3) $$



where we have used $g(\mathbf{x}|\theta_0)$ to represent the true model $f(\mathbf{x})$. The advantage of this estimator is that it can be approximated without knowing the true distribution $f(\mathbf{x})$. To do that we first consider the situation where $\theta$ is close to $\theta_0$. This assumption can be justified in the limit of large $N$, $N$ being the number of measurement records, since the model $\theta$ ought to approach $\theta_0$ asymptotically (assuming, for simplicity, $\theta_0$ is unique). We know that $S(\theta_0 : \theta)$ must have a maximum when $\theta = \theta_0$ (this is Gibbs' inequality, i.e., the nonnegativity of the Kullback-Leibler divergence), and we may then symbolically expand $S(\theta_0 : \theta)$ in the vicinity of $\theta_0$ by



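$$ S(\theta_0 : \theta) = S(\theta_0 : \theta_0) - \frac{1}{2}\|\theta - \theta_0\|_{\theta_0}^2 + o\left(\|\theta - \theta_0\|_{\theta_0}^2\right), \qquad (4) $$

where

$$ \|\theta - \theta_0\|_{\theta_0}^2 = \sum_{i,j} (\theta_i - \theta_{0,i})(\theta_j - \theta_{0,j})\, J_{ij}(\theta_0), \qquad J_{ij}(\theta_0) = -\left.\frac{\partial^2 S(\theta_0 : \theta)}{\partial \theta_i\, \partial \theta_j}\right|_{\theta = \theta_0}, \qquad (5) $$

denoting a squared length derived from a metric defined at $\theta_0$. It can be proved that when $N$ is sufficiently large, $\|\theta - \theta_0\|_{\theta_0}^2$ is approximately distributed according to the $\chi^2_K$ distribution, with $K$ equal to the number of independent parameters used by the model $\theta$. From the properties of the $\chi^2_K$ distribution, we know the average value of $\|\theta - \theta_0\|_{\theta_0}^2$ will approach $K$.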

The next step is to evaluate the estimator $S(\theta_0 : \theta_0)$, where $\theta_0$ is now considered a variable. Suppose we find the maximum likelihood estimate $\theta_M$ from the measurement outcomes such that $S(\theta_M : \theta_M)$ is the maximum. Now $\theta_M$ should also be close to the true model $\theta_0$ when $N$ is sufficiently large. Therefore we can similarly expand $S(\theta_0 : \theta_0)$ in the vicinity of $\theta_M$ as



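$$ S(\theta_0 : \theta_0) = S(\theta_M : \theta_M) - \frac{1}{2}\|\theta_0 - \theta_M\|_{\theta_M}^2 + o\left(\|\theta_0 - \theta_M\|_{\theta_M}^2\right), \qquad (6) $$

where $\|\theta_0 - \theta_M\|_{\theta_M}^2$ is a length defined similarly as in Eq. (5), and has the same statistical attributes as $\|\theta - \theta_0\|_{\theta_0}^2$, since $\theta_0$ is related to $\theta_M$ in the same way as $\theta$ is related to $\theta_0$, and since $\theta_M$ is very close to $\theta_0$. Its average value, therefore, again approaches $K$, according to the $\chi^2_K$ distribution. Thus, substituting Eq. (6) into Eq. (4), we are able to rewrite Eq. (4) as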



$$ S(\theta_0 : \theta) \approx S(\theta_M : \theta_M) - K. \qquad (7) $$



We see that now our target estimator $S(\theta_0 : \theta)$ is evaluated by the MLE solution $\theta_M$ only (plus the number of parameters $K$ of the model), with no knowledge of what the true model $f$ is. The assumption that underlies this convenience consists of two parts: approximating $S(\theta_0 : \theta)$ by its maximum at $\theta = \theta_0$, and approximating $S(\theta_0 : \theta_0)$ from the data by its optimum at $\theta_M$. The two deviations from the respective maxima are equal on average ($K/2$ each) and together result simply in the appearance of the constant $K$.
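The claim that the two deviations add up to $K$ on average can be checked numerically in the simplest setting where the MLE is known in closed form. The sketch below assumes a categorical distribution over a handful of outcomes, fitted by the full categorical model (so $K$ is the number of outcomes minus one and the MLE is just the vector of observed frequencies). It compares the maximized log-likelihood on the data with the expected log-likelihood of the same fitted model on fresh data; the function name `mean_optimism` is illustrative, not taken from the text.

```python
import numpy as np


def mean_optimism(p_true, n_samples, n_trials, seed=0):
    """Average of [maximized log-likelihood on the data] minus
    [expected log-likelihood of the fitted model on new data].

    For the full categorical model on len(p_true) outcomes, K = len(p_true) - 1,
    and the Akaike argument predicts an average gap close to K.
    """
    rng = np.random.default_rng(seed)
    p_true = np.asarray(p_true, dtype=float)
    counts = rng.multinomial(n_samples, p_true, size=n_trials)
    counts = counts[np.all(counts > 0, axis=1)]     # drop rare draws with an empty cell
    log_p_hat = np.log(counts / n_samples)          # MLE = observed frequencies
    in_sample = np.sum(counts * log_p_hat, axis=1)  # plays the role of L_M
    out_of_sample = n_samples * (log_p_hat @ p_true)  # N times the expected log-likelihood under the truth
    return float(np.mean(in_sample - out_of_sample))


p = [0.1, 0.2, 0.3, 0.25, 0.15]   # five outcomes, so K = 4
print(mean_optimism(p, n_samples=2000, n_trials=2000))   # should come out close to 4
```

The gap is the "optimism" of the maximized likelihood; AIC subtracts it by adding $2K$ to $-2L_M$.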

We now denote $L_M = S(\theta_M : \theta_M)$, which is the (logarithm of the) maximum likelihood obtainable by our model, with respect to a given set of measurement records. The AIC is then defined by

$$ \mathrm{AIC} = -2L_M + 2K. \qquad (8) $$



Apart from the conventional factor 2, and a constant independent of the model $\theta$, AIC is an estimator of the quantity in Eq. (1) we originally considered, that is, the Kullback-Leibler divergence between a model that is used to describe the true model and the true model itself. Therefore a given model is considered better than another if it has a lower value of AIC.

Finally, in the case that $N$ is not yet large enough for the asymptotic relations to hold to a very good approximation, one can include a correction term in the AIC that takes the deviation from asymptotic values into account. The corrected AIC gives rise to a slightly different criterion

$$ \mathrm{AIC_c} = -2L_M + 2K + \frac{2K(K+1)}{N - K - 1}. \qquad (9) $$
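Eqs. (8) and (9) translate directly into code. In the sketch below the helper names are illustrative; `max_log_likelihood` stands for $L_M$, `k` for $K$ and `n` for $N$. As an example of the size of the correction: for a full-parameter model of four qubits ($K = 255$) at $N = 1000$ the extra term of Eq. (9) is $2 \cdot 255 \cdot 256/744 \approx 175.5$, whereas for a one-parameter model it is about $0.004$.

```python
def aic(max_log_likelihood, k):
    """Eq. (8): AIC = -2 L_M + 2K, with L_M the maximized log-likelihood."""
    return -2.0 * max_log_likelihood + 2.0 * k


def aicc(max_log_likelihood, k, n):
    """Eq. (9): AIC plus the small-sample correction 2K(K+1)/(N-K-1)."""
    if n - k - 1 <= 0:
        raise ValueError("the correction requires N > K + 1")
    return aic(max_log_likelihood, k) + 2.0 * k * (k + 1) / (n - k - 1)


# Size of the correction term alone (L_M set to 0 so that only the penalties remain).
print(aicc(0.0, 255, 1000) - aic(0.0, 255))   # about 175.5 for K = 255
print(aicc(0.0, 1, 1000) - aic(0.0, 1))       # about 0.004 for K = 1
```

Only differences of AIC values between models fitted to the same data are meaningful, so the model-independent constants dropped from $L_M$ do not matter.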



III. RESULTS

A. Dicke states



We will apply the AIC to measurements on a popular family of entangled states, the Dicke states of four qubits [20-25]. We simulate two different experiments: one tomographically complete experiment, and another measuring an entanglement witness. We include imperfections of a simple type, and we investigate how model selection, according to the AIC, would work. We consider cases where we happen to guess the correct model, as well as cases where our initial guess is, in fact, incorrect.

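We consider the four-qubit Dicke states with one or two excitations $|D_4^{1,2}\rangle$ (with the state $|1\rangle$ representing an excitation):

$$ |D_4^1\rangle = \left(|0001\rangle + |0010\rangle + |0100\rangle + |1000\rangle\right)/2, \qquad (10a) $$
$$ |D_4^2\rangle = \left(|0011\rangle + |0101\rangle + |0110\rangle + |1001\rangle + |1010\rangle + |1100\rangle\right)/\sqrt{6}. \qquad (10b) $$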



For simplicity, let us suppose that white noise is the only random noise in the state generation, and that it corresponds to mixing of the ideal state with the maximally mixed state of the entire space (instead of the subspace with exactly one or two excitations, which could be a reasonable choice, too, depending on the actual implementation of the Dicke states). We thus write the states under discussion as

$$ \rho^{1,2}(a) = (1 - a)\,|D_4^{1,2}\rangle\langle D_4^{1,2}| + a\,\mathbb{1}/D, \qquad (11) $$

where $\mathbb{1}/D$ is the maximally mixed state for dimension $D = 2^4$, and $0 \le a \le 1$. We will fix the actual state generating our data to be

$$ \rho^{1,2}_{\mathrm{actual}} = \rho^{1,2}(a = 0.2). \qquad (12) $$



This choice is such that the mixed state is entangled (as measured by our multi-qubit version of the negativity, see below), even though the entanglement witness whose measurement we consider later in Sec. III D just fails to detect it.
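A minimal NumPy sketch of Eqs. (10)-(12), assuming the usual ordering in which the leftmost qubit of $|b_1 b_2 b_3 b_4\rangle$ is the most significant bit of the basis index (so $|0001\rangle$ has index 1 and $|1000\rangle$ index 8). The function names are hypothetical helpers, not taken from the text.

```python
import itertools

import numpy as np


def dicke_state(n_qubits, n_exc):
    """Eq. (10): equal superposition of all basis states with n_exc qubits in |1>."""
    psi = np.zeros(2 ** n_qubits, dtype=complex)
    for excited in itertools.combinations(range(n_qubits), n_exc):
        # Qubit 0 (the leftmost label) is the most significant bit of the index.
        index = sum(1 << (n_qubits - 1 - j) for j in excited)
        psi[index] = 1.0
    return psi / np.linalg.norm(psi)


def white_noise_state(psi, a):
    """Eq. (11): (1 - a)|psi><psi| + a * identity / D."""
    d = psi.size
    return (1 - a) * np.outer(psi, psi.conj()) + a * np.eye(d) / d


D1 = dicke_state(4, 1)
D2 = dicke_state(4, 2)
rho_actual_1 = white_noise_state(D1, 0.2)   # Eq. (12), one excitation
rho_actual_2 = white_noise_state(D2, 0.2)   # Eq. (12), two excitations

print(np.flatnonzero(D1))   # indices of |0001>, |0010>, |0100>, |1000>
print(np.real(np.trace(rho_actual_2 @ rho_actual_2)))   # purity (1-a)^2 + (2a - a^2)/D
```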
For our first model (to be tested by AIC) we wish to pick a one-parameter model (so, $K = 1$) that also includes a wrong guess. A straightforward model choice, denoted by $M_{1\phi}$, is

$$ M_{1\phi}: \quad \rho_\phi(q) = (1 - q)\,|\Psi^{1,2}_{\mathrm{target}}(\phi)\rangle\langle\Psi^{1,2}_{\mathrm{target}}(\phi)| + q\,\mathbb{1}/D. \qquad (13) $$



We refer to the pure states appearing here as the target states $|\Psi^{1,2}_{\mathrm{target}}(\phi)\rangle$, simulating the case where we (possibly incorrectly) think we would be creating a pure state of that form, if only the white noise were absent ($q = 0$). The phase $\phi$ is included not as a (variable) parameter of the model but as an inadvertently mis-specified property. In this case, it stands for us being wrong about a single relative phase in one of the qubits in state $|1\rangle$. Without loss of generality we assume the first qubit in our representation to carry the wrong phase, and we write

$$ |\Psi^1_{\mathrm{target}}(\phi)\rangle = \frac{1}{2}\left(|0001\rangle + |0010\rangle + |0100\rangle + e^{i\phi}|1000\rangle\right), \qquad (14a) $$
$$ |\Psi^2_{\mathrm{target}}(\phi)\rangle = \frac{1}{\sqrt{6}}\left[|0011\rangle + |0101\rangle + |0110\rangle + e^{i\phi}\left(|1001\rangle + |1010\rangle + |1100\rangle\right)\right]. \qquad (14b) $$



Alternatively, if we do consider this a two-parameter model (changing $K = 1$ to $K = 2$), then $\phi$ is variable, and we would optimize over $\phi$. In our case, this optimum value should always be close to $\phi = 0$.
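The target states of Eq. (14) differ from the Dicke states only by the phase $e^{i\phi}$ on every term in which the first qubit is excited. The sketch below builds them and the model state of Eq. (13), assuming the leftmost qubit is the most significant bit of the basis index (helper names hypothetical). It also evaluates the overlap with the ideal Dicke state, which by Eqs. (10) and (14) should equal $|\langle D_4^1|\Psi^1_{\mathrm{target}}(\phi)\rangle|^2 = (10 + 6\cos\phi)/16$ and $|\langle D_4^2|\Psi^2_{\mathrm{target}}(\phi)\rangle|^2 = (1 + \cos\phi)/2$.

```python
import itertools

import numpy as np


def target_state(n_exc, phi, n_qubits=4):
    """Eq. (14): Dicke state in which every term with the first qubit in |1>
    carries the (possibly mis-specified) phase e^{i phi}."""
    psi = np.zeros(2 ** n_qubits, dtype=complex)
    for excited in itertools.combinations(range(n_qubits), n_exc):
        index = sum(1 << (n_qubits - 1 - j) for j in excited)   # qubit 0 = leftmost
        psi[index] = np.exp(1j * phi) if 0 in excited else 1.0
    return psi / np.linalg.norm(psi)


def model_state(n_exc, phi, q):
    """Eq. (13): (1 - q)|Psi_target(phi)><Psi_target(phi)| + q * identity / D."""
    psi = target_state(n_exc, phi)
    d = psi.size
    return (1 - q) * np.outer(psi, psi.conj()) + q * np.eye(d) / d


def overlap_with_dicke(n_exc, phi):
    """|<D_4^{n_exc}|Psi_target(phi)>|^2; equal to 1 only when phi = 0 (mod 2 pi)."""
    return abs(np.vdot(target_state(n_exc, 0.0), target_state(n_exc, phi))) ** 2


for phi in (0.0, np.pi / 4, np.pi / 2):
    print(phi, overlap_with_dicke(1, phi), overlap_with_dicke(2, phi))
```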



B. Tomographically complete measurement 

We first consider a tomographically complete measurement, in which a so-called SIC-POVM (symmetric informationally complete positive operator valued measure [26]) with 4 outcomes is applied to each qubit individually. We first test our one-parameter model, and compare it to the FPM, which contains 255 ($= 4^4 - 1$) parameters, which is the number of parameters needed to fully describe a general state of 4 qubits. With definition Eq. (8) we have

$$ \mathrm{AIC}(M_{1\phi}) = -2L_M(M_{1\phi}) + 2, \qquad (15) $$

since $K = 1$ for $M_{1\phi}$. For the FPM we have

$$ \mathrm{AIC}(\mathrm{FPM}) = -2L_M(\mathrm{FPM}) + 2 \times 255, \qquad (16) $$



where $L_M(\mathrm{FPM})$ is the log of the maximum likelihood obtainable by the FPM. The latter can be bounded from above by noting that the best possible FPM would generate probabilities that exactly match the actual observed frequencies of all measurement outcomes. In the following we will always use that upper bound, rather than the actual maximum likelihood. Even though it is possible to find the maximum likelihood state in principle (and even in practice for small enough Hilbert spaces), we are only concerned with the FPM's ranking according to the AIC, which does not require its density matrix representation.

FIG. 1: How AIC ranks the one- and two-parameter models vs. the full-parameter model (FPM): Plot of the difference between AIC values of our models and the FPM, i.e., $-\Delta\mathrm{AIC} = \mathrm{AIC(FPM)} - \mathrm{AIC}(M_{1\phi})$ or $-\Delta\mathrm{AIC} = \mathrm{AIC(FPM)} - \mathrm{AIC}(M_{2\phi})$, for various numbers of SIC-POVM measurements, $N$, with $|\Psi^1_{\mathrm{target}}\rangle$ as the target state, as functions of the angle $\phi$. The horizontal line demarcates $\Delta\mathrm{AIC} = 0$: points above (below) that line correspond to cases where the model with fewer (more) parameters is preferred. The figures with $|\Psi^2_{\mathrm{target}}\rangle$ as the target state look very similar (see FIG. 2 for an example of this similarity).
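For $M_{1\phi}$ to beat the FPM we require

$$ -\Delta\mathrm{AIC} \equiv \mathrm{AIC(FPM)} - \mathrm{AIC}(M_{1\phi}) > 0. \qquad (17) $$

This is a sufficient but not necessary requirement, as we use the above-mentioned upper bound to the FPM likelihood (which makes our value of AIC(FPM) a lower bound on its true value).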
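The upper bound on $L_M(\mathrm{FPM})$ and the test of Eq. (17) need only the outcome counts. A sketch, assuming the data are summarized as counts $n_k$ with $\sum_k n_k = N$: the bound is $\sum_k n_k \log(n_k/N)$, and the multinomial coefficient is dropped because it is the same for every model and cancels in $\Delta\mathrm{AIC}$. Function names are hypothetical.

```python
import numpy as np


def fpm_log_likelihood_bound(counts):
    """Upper bound on L_M(FPM): log-likelihood of the observed frequencies themselves."""
    counts = np.asarray(counts, dtype=float)
    seen = counts > 0                     # unobserved outcomes contribute 0 * log 0 = 0
    return float(np.sum(counts[seen] * np.log(counts[seen] / counts.sum())))


def model_log_likelihood(counts, probs):
    """Multinomial log-likelihood of model outcome probabilities (same dropped constant)."""
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    seen = counts > 0
    return float(np.sum(counts[seen] * np.log(probs[seen])))


def minus_delta_aic(counts, model_max_log_likelihood, k_model, k_fpm):
    """Eq. (17): AIC(FPM) - AIC(model), using the FPM bound.
    A positive value is sufficient (not necessary) for the model to beat the FPM."""
    aic_fpm = -2.0 * fpm_log_likelihood_bound(counts) + 2.0 * k_fpm
    aic_model = -2.0 * model_max_log_likelihood + 2.0 * k_model
    return aic_fpm - aic_model


counts = [30, 70]
print(minus_delta_aic(counts, model_log_likelihood(counts, [0.3, 0.7]), k_model=0, k_fpm=1))
```

In the toy example the model reproduces the frequencies exactly, so only the penalty difference $2(1 - 0) = 2$ remains.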

We plot the difference $\Delta\mathrm{AIC}$ between the two rankings in FIG. 1(a) for various values of the number of measurements, and for various values of the phase $\phi$. We observe the following: the simple model is, correctly, judged better than the FPM when the phase is sufficiently small. The more measurements one performs, the smaller $\phi$ has to be for AIC to still declare the model superior to the FPM (i.e., for the points to stay above the solid line, at $\Delta\mathrm{AIC} = 0$).
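A compact simulation in the spirit of FIG. 1(a) is sketched below; it is not the authors' code. Assumptions: a tetrahedral single-qubit SIC-POVM in one particular (arbitrary) orientation, data drawn from $\rho^1(a = 0.2)$ with a fixed random seed, $L_M(M_{1\phi})$ found by a grid search over $q$, and $L_M(\mathrm{FPM})$ replaced by its upper bound. Because outcome probabilities are linear in $\rho$ and each product effect has trace $1/16$, the model probabilities are simply $(1 - q)\,p_{\mathrm{pure}} + q/256$. Single runs scatter, so only qualitative agreement with the figure should be expected.

```python
import itertools

import numpy as np

# Pauli matrices and a tetrahedral single-qubit SIC-POVM, E_k = (I + n_k . sigma) / 4.
# The orientation of the tetrahedron is an arbitrary choice (an assumption here).
PAULIS = [np.array([[0, 1], [1, 0]], dtype=complex),
          np.array([[0, -1j], [1j, 0]], dtype=complex),
          np.array([[1, 0], [0, -1]], dtype=complex)]
BLOCH = [(0.0, 0.0, 1.0),
         (2 * np.sqrt(2) / 3, 0.0, -1 / 3),
         (-np.sqrt(2) / 3, np.sqrt(2 / 3), -1 / 3),
         (-np.sqrt(2) / 3, -np.sqrt(2 / 3), -1 / 3)]
SIC = [(np.eye(2) + sum(c * s for c, s in zip(n, PAULIS))) / 4 for n in BLOCH]

# The 4^4 = 256 product effects obtained by measuring every qubit separately.
EFFECTS = np.array([np.kron(np.kron(a, b), np.kron(c, d))
                    for a, b, c, d in itertools.product(SIC, repeat=4)])


def outcome_probs(rho):
    """Born rule Tr(E_k rho) for all 256 outcomes."""
    return np.real(np.einsum("kij,ji->k", EFFECTS, rho))


def target_state(phi):
    """|Psi^1_target(phi)> of Eq. (14a); index = binary label, leftmost qubit most significant."""
    psi = np.zeros(16, dtype=complex)
    psi[[1, 2, 4]] = 1.0          # |0001>, |0010>, |0100>
    psi[8] = np.exp(1j * phi)     # |1000>: the first qubit carries the phase
    return psi / 2


def minus_delta_aic_sic(phi, n_meas, seed=1):
    """Simulate n_meas SIC-POVM runs on rho^1(a = 0.2) and return AIC(FPM) - AIC(M_1phi),
    with L_M(FPM) replaced by its upper bound (K = 255 versus K = 1)."""
    rng = np.random.default_rng(seed)
    d1 = target_state(0.0)
    rho_true = 0.8 * np.outer(d1, d1.conj()) + 0.2 * np.eye(16) / 16
    p_true = np.clip(outcome_probs(rho_true), 0.0, None)
    counts = rng.multinomial(n_meas, p_true / p_true.sum())

    psi = target_state(phi)
    p_pure = outcome_probs(np.outer(psi, psi.conj()))
    qs = np.linspace(1e-3, 1.0, 1000)               # grid search for the MLE of q
    probs = (1 - qs)[:, None] * p_pure[None, :] + qs[:, None] / 256
    l_model = np.max(np.log(probs) @ counts)

    seen = counts > 0
    l_fpm_bound = np.sum(counts[seen] * np.log(counts[seen] / n_meas))
    return (-2 * l_fpm_bound + 2 * 255) - (-2 * l_model + 2 * 1)


for phi in (0.0, np.pi / 4, np.pi / 2):
    print(phi, minus_delta_aic_sic(phi, 10000))
```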

Although the correction to the AIC mentioned in Eq. (9) is not very small for the FPM for $N = 1000$, applying that correction still does not shift the second and third point below zero: that is, $N = 1000$ measurements are still not sufficient for the AICc to recognize that $\phi = \pi/4$ and $\phi = \pi/2$ are incorrect guesses. One can argue about what the cause of this is: it could be that $N$ is just too small for the derivation of the AIC (or even the AICc) to be correct. Or it could be that the AIC ranking is unreliable because the assumption that the true model is included in the model set is violated. Or it could be that, even with a perfectly valid criterion (perhaps the TIC), the statistical noise present in the data would still be too large.

FIG. 2: Comparing single- and double-excitation Dicke states: The difference between AICs of $M_{1\phi}$ and the FPM, i.e., $-\Delta\mathrm{AIC} = \mathrm{AIC(FPM)} - \mathrm{AIC}(M_{1\phi})$, for both target states, when $N = 10000$, as functions of $\phi$. The horizontal line demarcates $\Delta\mathrm{AIC} = 0$.

If we consider the phase as a second (variable) parameter (thus creating a two-parameter model), then we can give FIG. 1 a different interpretation: we would pick $\phi = 0$ as the best choice, and we would increase $K$ by 1. The latter correction is small on the scale of the plots, and so we find the two-parameter model to be superior to the one-parameter model for any nonzero plotted value of $\phi$, and to the FPM. This is a good illustration of the following rather obvious fact: even if one has the impression that a particular property of one's quantum source is (or ought to be) known, it still might pay off to represent that property explicitly as a variable parameter (at the small cost of increasing $K$ by 1), and let the data determine its best value.



C. Cross modeling 

Suppose one picked a one-parameter model with a wrong (nonzero) value of $\phi$, and the AIC has declared the model to be worse than the FPM. How can one improve the model in a systematic way when one lacks a good idea of which parameters to add to the model (we assume we already incorporated all parameters deemed important a priori)? Apart from taking more and different measurements, one could use a hint from the existing data. One method making use of the data is to apply "cross modeling," where half the data is used to construct a modification to the model, and the remaining half is used for model validation, again by evaluating AIC on just that part of the data. So suppose $N$ measurements generate a data sequence $F = \{f_1, f_2, \ldots, f_N\}$. One takes, e.g., the first $N/2$ data points, $\{f_1, \ldots, f_{N/2}\}$, as the training set, and acquires the MLE state $\rho_{\mathrm{MLE}}$, or a numerically feasible approximation thereof, with respect to the training set. We then create a model with two parameters like so:



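$$ \rho_\phi(\epsilon, q) = (1 - \epsilon)\left[(1 - q)\,\rho_{\mathrm{MLE}} + q\,|\Psi_{\mathrm{target}}(\phi)\rangle\langle\Psi_{\mathrm{target}}(\phi)|\right] + \epsilon\,\mathbb{1}/D. \qquad (18) $$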



For practical reasons $\rho_{\mathrm{MLE}}$ does not need to be strictly the MLE state, in particular when the dimension of the full parameter space is large. One would only require it to explain $\{f_1, \ldots, f_{N/2}\}$ well enough to make sure that part of the data is properly incorporated in the model. Thus, one could, for example, use one of the numerical shortcuts described in [27]. The rest of the data, $\{f_{N/2+1}, \ldots, f_N\}$, is used to evaluate $M_{2\phi}$ against the FPM.
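The bookkeeping of cross modeling can be sketched as follows. Assumptions: the data record is a sequence of integer outcome labels, and some reconstruction routine (not shown here) has already turned the training half into an approximate $\rho_{\mathrm{MLE}}$. The split is done once and in order, as the text requires, and only the validation half enters the likelihood used for the AIC. All names are hypothetical.

```python
import numpy as np


def split_record(outcomes):
    """Split the record once: {f_1..f_N/2} for training, {f_N/2+1..f_N} for validation."""
    half = len(outcomes) // 2
    return outcomes[:half], outcomes[half:]


def cross_model_state(rho_mle, psi_target, eps, q):
    """Eq. (18): (1 - eps)[(1 - q) rho_MLE + q |psi><psi|] + eps * identity / D."""
    d = psi_target.size
    target = np.outer(psi_target, psi_target.conj())
    return (1 - eps) * ((1 - q) * rho_mle + q * target) + eps * np.eye(d) / d


def validation_log_likelihood(validation_outcomes, probs):
    """Log-likelihood of the validation half only, given model outcome probabilities."""
    counts = np.bincount(validation_outcomes, minlength=len(probs))
    seen = counts > 0
    return float(np.sum(counts[seen] * np.log(np.asarray(probs, dtype=float)[seen])))


record = np.array([0, 1, 1, 0, 1, 1, 1, 0])
train, valid = split_record(record)
rho = cross_model_state(np.eye(2) / 2, np.array([1.0, 0.0]), eps=0.1, q=0.5)
print(train, valid, np.real(np.trace(rho)))
```

Maximizing `validation_log_likelihood` over $(\epsilon, q)$ gives the $L_M$ that enters the AIC with $K = 2$.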

We note the resemblance of this procedure to the method of "cross-validation" [28]. In cross-validation one tries to find out how well a given predictive model performs by partitioning the data set into a training set and a validation set (exactly the same idea as given above). One uses multiple different partitions, and the results are averaged and optimized over those partitions. It can be shown [29] that under certain conditions cross-validation and the AIC are asymptotically equivalent in model selection. This virtually exempts one from having to check multiple partitions of the data set: one can instead apply the AIC to the whole data set.

It is worth emphasizing that what we do here is different in two ways. First, our model is not fixed but modified, based on information obtained from one half of the data. Second, we partition the data set only once, and the reason is that it would be cheating to calculate the (approximate) MLE state of the full set of data (or, similarly, check many partitions and average), and then consider the resulting MLE state a parameter-free model.

FIG. 1(b) shows results for $M_{2\phi}$ and SIC-POVM measurements. When the number of measurements is $N = 1000$, all $M_{2\phi}$ models are considered better than the FPM, regardless of the phase error $\phi$ assumed for the target state. The reason is that around $\phi = \pi/2$ the approximate MLE state obtained from the first half of the data is able to "predict" the measurement outcomes (including their large amount of noise!) on the second half better than the one-parameter model with the wrong phase.

On the other hand, when $N = 10000$ the AIC recognizes only the simple models with small phase errors ($\phi = 0, \pi/8, \pi/4$) as better than the FPM. So neither the approximate MLE state nor the one-parameter model with the wrong phase performs well. This indicates how many measurements are needed to predict a single phase to a given precision.

FIG. 3: How the AIC ranks our one-parameter model vs. the FPM for an entanglement witness measurement: The difference between AICs of $M_{1\phi}$ with $|\Psi^2_{\mathrm{target}}(\phi)\rangle$ and the FPM, i.e., $-\Delta\mathrm{AIC} = \mathrm{AIC(FPM)} - \mathrm{AIC}(M_{1\phi})$, for different numbers of witness measurements, as functions of $\phi$. The horizontal line demarcates $\Delta\mathrm{AIC} = 0$.

FIG. 4: Does the witness $W_{J_{xy}}$ detect entanglement if there is a phase error?: Witness performance $\langle W_{J_{xy}}\rangle$ for different states (defined as in Eq. (20)) as a function of $\phi$. A negative expectation value detects entanglement.

D. Witness measurement

For states that are close to symmetric Dicke states $|D_N^{N/2}\rangle$, their entanglement can be verified by using measurements that require only two different local settings, e.g., spins (or polarizations) either all in the $x$-direction or all in the $y$-direction. In particular, when $N = 4$, an efficient witness is $W_{J_{xy}} = 7/2 + \sqrt{3} - J_x^2 - J_y^2$ [30], where $J_{x,y} = \frac{1}{2}\sum_j \sigma^{(j)}_{x,y}$, with $\sigma^{(j)}_{x,y}$ the Pauli matrices for the $j$-th subsystem. This witness detects (by having a negative expectation value) Dicke states with a white noise background, i.e., $\rho(a) = (1 - a)|D_4^2\rangle\langle D_4^2| + a\,\mathbb{1}/D$, whenever $0 \le a < 0.1920$.

So we suppose we perform $N/2$ measurements on all of the four spins in the $x$-direction simultaneously, and another $N/2$ similar measurements in the $y$-direction. Instead of calculating the witness $W_{J_{xy}}$ and ending up with one single value determining entanglement, we make use of the full record of all individual outcomes in order to evaluate (and then maximize) likelihoods. For example, for the measurement of all four spins in the $x$-direction simultaneously, we can count the number of times they are projected onto the $|x{+}\,x{+}\,x{+}\,x{+}\rangle$ state, the $|x{+}\,x{+}\,x{+}\,x{-}\rangle$ state, etc. In both the $x$- and $y$-directions, the number of independent observables (i.e., the number of independent joint expectation values) is 15, which can be seen as follows. Any density matrix of $M$ qubits can be expressed in terms of the expectation values of $4^M$ tensor products of the 3 Pauli operators and the identity $\mathbb{1}$, but the expectation value of the product of $M$ identities equals 1 for any density matrix, thus leading to $4^M - 1$ independent parameters encoded in a general density matrix. From having measured just $\sigma_x$ on all $M$ qubits, we can evaluate all expectation values of all operators that are tensor products of $\sigma_x$ and the identity. There are $2^M$ such products, and subtracting the trivial expectation value for $\mathbb{1}^{\otimes M}$ leaves $2^M - 1$ independent expectation values.

This means it only takes $2 \times 15 = 30$ independent parameters to form the FPM, and we have $K = 30$. Similar to the tomographically complete case, we do not need the concrete form of the whole $(255 - 30)$-dimensional manifold of MLE states, nor do we need to explicitly parameterize the 30-parameter FPM states, as we can simply upper bound the maximum likelihood for this model, $L_M(\mathrm{FPM})$, by noting that the best one could possibly do is reproduce exactly the observed frequencies of all possible measurement outcomes.


E. Estimating entanglement

Our state $\rho_{\mathrm{actual}} = \rho^2(a = 0.2)$ is just not detected by the witness $W_{J_{xy}}$, but still contains a considerable amount of entanglement. We choose to quantify this entanglement by means of three entanglement monotones (of which only two are independent), simply constructed from all bipartite negativities. If the four parties are denoted A, B, C and D, the generalized negativities [31-33] are defined as

$$ \mathcal{N}_1 = \left(\mathcal{N}_{AB-CD}\,\mathcal{N}_{AC-BD}\,\mathcal{N}_{AD-BC}\right)^{1/3}, \qquad (19a) $$
$$ \mathcal{N}_2 = \left(\mathcal{N}_{A-BCD}\,\mathcal{N}_{B-CDA}\,\mathcal{N}_{C-DAB}\,\mathcal{N}_{D-ABC}\right)^{1/4}, \qquad (19b) $$
$$ \mathcal{N}_0 = \left(\mathcal{N}_1^3\,\mathcal{N}_2^4\right)^{1/7}, \qquad (19c) $$

where $\mathcal{N}_{AB-CD}$ denotes the negativity with respect to partition AB against CD, etc. The main advantage of the generalized negativities is that they are all efficiently computable directly from the density matrix. We have for our state $\mathcal{N}_1 = 0.6293$, $\mathcal{N}_2 = 0.3875$, and $\mathcal{N}_0 = 0.4770$.
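Eq. (19) only needs partial transposes and eigenvalues. The sketch assumes the convention in which a bipartite negativity is the sum of the magnitudes of the negative eigenvalues of the partial transpose, $(\|\rho^{T_A}\|_1 - 1)/2$, and the qubit ordering A, B, C, D = qubits 0 to 3; note that Eq. (19c) is the geometric mean over all seven bipartitions. With these assumptions the sketch gives $\mathcal{N}_2 = 0.3875$ for $\rho^2(0.2)$ and values of $\mathcal{N}_1$ and $\mathcal{N}_0$ close to the quoted ones. Function names are hypothetical.

```python
import numpy as np


def partial_transpose(rho, qubits, n=4):
    """Transpose rho (2^n x 2^n) with respect to the listed qubits (qubit 0 = leftmost)."""
    t = rho.reshape([2] * (2 * n))   # axes: row index of qubits 0..n-1, then column index
    axes = list(range(2 * n))
    for j in qubits:
        axes[j], axes[n + j] = axes[n + j], axes[j]   # swap row and column index of qubit j
    return t.transpose(axes).reshape(2 ** n, 2 ** n)


def negativity(rho, qubits):
    """Sum of |negative eigenvalues| of the partial transpose, i.e. (||rho^T||_1 - 1) / 2."""
    eig = np.linalg.eigvalsh(partial_transpose(rho, qubits))
    return float(-eig[eig < 0].sum())


def generalized_negativities(rho):
    """Eq. (19): N1 over the three 2|2 cuts, N2 over the four 1|3 cuts, N0 over all seven."""
    two_two = [(0, 1), (0, 2), (0, 3)]       # AB|CD, AC|BD, AD|BC
    one_three = [(0,), (1,), (2,), (3,)]     # A|BCD, B|CDA, C|DAB, D|ABC
    n1 = np.prod([negativity(rho, s) for s in two_two]) ** (1 / 3)
    n2 = np.prod([negativity(rho, s) for s in one_three]) ** (1 / 4)
    n0 = (n1 ** 3 * n2 ** 4) ** (1 / 7)
    return n1, n2, n0


dicke2 = np.zeros(16)
dicke2[[3, 5, 6, 9, 10, 12]] = 1 / np.sqrt(6)
rho_actual = 0.8 * np.outer(dicke2, dicke2) + 0.2 * np.eye(16) / 16   # rho^2(a = 0.2)
n1, n2, n0 = generalized_negativities(rho_actual)
print(n1, n2, n0)
```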

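Similarly to the tomographically complete case, we first consider the following one-parameter model:

$$ M_{1\phi}: \quad \rho_\phi(q) = (1 - q)\,|\Psi^2_{\mathrm{target}}(\phi)\rangle\langle\Psi^2_{\mathrm{target}}(\phi)| + q\,\mathbb{1}/D, \qquad (20) $$

where $|\Psi^2_{\mathrm{target}}(\phi)\rangle$ is defined in Eq. (14b). The AICs for $M_{1\phi}$ and the FPM are

$$ \mathrm{AIC}(M_{1\phi}) = -2L_M(M_{1\phi}) + 2, \qquad (21) $$
$$ \mathrm{AIC}(\mathrm{FPM}) = -2L_M(\mathrm{FPM}) + 2 \times 30. \qquad (22) $$

FIG. 5: How one quantifies entanglement from a witness measurement: The posterior probability distributions of $\mathcal{N}_0$ for different numbers of witness measurements from model $M_{1\phi}(q)$ with target state $|\Psi^2_{\mathrm{target}}(\phi)\rangle$, where $\phi = 0, \pi/6, \pi/3$. $\mathcal{N}_0(\rho_{\mathrm{actual}}) = 0.4770$. The prior distribution is assumed to be uniform on [0,1] for both $\epsilon$ and $q$. The distributions of $\mathcal{N}_1$ and $\mathcal{N}_2$ are similar (up to a simple shift). Panels: (a) $N = 100$; (b) $N = 1000$.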



FIG. 3 shows that, as before, the marks above the horizontal solid line correspond to models deemed better than the FPM. Compared to the case of full tomography (FIG. 1(a)), here AIC(FPM) exceeds AIC($M_{1\phi}$) by a much smaller amount, even when the phase is correct ($\phi = 0$). The absolute value of the difference is not relevant, though; what counts is its sign. The obvious reason for the smaller difference is that the number of independent parameters for the FPM has dropped from 255 to 30. In addition, the FPM in this case does not refer to a specific 30-parameter model. On the contrary, since the number of degrees of freedom of the quantum system is still 255, there is a whole subspace of states, spanning a number of degrees of freedom equal to 225 ($= 255 - 30$), all satisfying the maximum likelihood condition.

The witness measurement is very sensitive to the phase error, even when the number of measurements is still small. When $N = 1000$, the estimation of $\phi$ is within an error of $\pi/6$, as the second point plotted is already below the line $\Delta\mathrm{AIC} = 0$. In FIG. 1(a), by comparison, this precision is reached only when $N = 10000$.

An interesting comparison can be made between AIC and the entanglement-detecting nature of the witness $W_{J_{xy}}$. FIG. 4 shows the performance of $\langle W_{J_{xy}}\rangle$ for the pure state $|\Psi^2_{\mathrm{target}}(\phi)\rangle$ ($\rho_\phi(q = 0)$, solid curve) and for the mixed state with 20% of identity mixed in ($\rho_\phi(q = 0.2)$, dot-dashed curve). Even when the state is pure, $\langle W_{J_{xy}}\rangle$ will not be able to witness any entanglement if the phase error is larger than $\pi/3$, just about when AIC declares such a model deficient. Entanglement in the mixed state $\rho_\phi(q = 0.2)$ of course is never witnessed. This means $W_{J_{xy}}$ is only an effective witness in the vicinity of $|D_4^2\rangle$, with limited tolerance of either white noise or phase noise in even just one of the four qubits. (Of course, one would detect the entanglement in the pure state by appropriately rotating the axes in the spin measurement on the first qubit over an angle $\phi$.)
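The curves of FIG. 4 can be made explicit. The target state of Eq. (14b) stays in the two-excitation subspace, so $J_z = 0$ and $\langle J_x^2 + J_y^2\rangle = \langle \mathbf{J}^2\rangle$; working this out (our own short derivation, not stated in the text) gives $\langle \mathbf{J}^2\rangle = 4 + 2\cos\phi$ and hence, for $\rho_\phi(q)$ of Eq. (20),
$\langle W_{J_{xy}}\rangle = (1 - q)(\sqrt{3} - 1/2 - 2\cos\phi) + q(3/2 + \sqrt{3})$. For the pure state the sign change is at $\cos\phi = (\sqrt{3} - 1/2)/2$, i.e. $\phi \approx 0.29\pi$, a little before $\pi/3$; for $q = 0.2$ the value is positive for every $\phi$. The sketch checks this closed form numerically (helper names hypothetical, same $J_{x,y} = \frac{1}{2}\sum_j \sigma^{(j)}_{x,y}$ normalization as above).

```python
import numpy as np

SX = np.array([[0, 1], [1, 0]], dtype=complex)
SY = np.array([[0, -1j], [1j, 0]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def collective(op, n=4):
    """J = (1/2) * sum_j op^(j) on n qubits."""
    total = np.zeros((2 ** n, 2 ** n), dtype=complex)
    for j in range(n):
        term = np.array([[1.0 + 0j]])
        for k in range(n):
            term = np.kron(term, op if k == j else I2)
        total += term / 2
    return total


JX, JY = collective(SX), collective(SY)
WITNESS = (3.5 + np.sqrt(3)) * np.eye(16) - JX @ JX - JY @ JY


def target2(phi):
    """|Psi^2_target(phi)> of Eq. (14b)."""
    psi = np.zeros(16, dtype=complex)
    psi[[3, 5, 6]] = 1.0                  # |0011>, |0101>, |0110>
    psi[[9, 10, 12]] = np.exp(1j * phi)   # |1001>, |1010>, |1100>
    return psi / np.sqrt(6)


def witness_vs_phase(phi, q):
    """<W_{Jxy}> for rho_phi(q) of Eq. (20), computed numerically."""
    psi = target2(phi)
    rho = (1 - q) * np.outer(psi, psi.conj()) + q * np.eye(16) / 16
    return float(np.real(np.trace(WITNESS @ rho)))


def witness_closed_form(phi, q):
    """Closed form derived in the lead-in."""
    return (1 - q) * (np.sqrt(3) - 0.5 - 2 * np.cos(phi)) + q * (1.5 + np.sqrt(3))


phi_star = np.arccos((np.sqrt(3) - 0.5) / 2)   # pure-state sign change
print(phi_star / np.pi)
```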

To test whether a few-parameter model correctly quantifies entanglement if that model is preferred over the FPM by AIC, we estimate a (posterior, Bayesian) probability distribution over the generalized negativities (defined above). We see that the first three curves in FIG. 5(a) and the first two curves in FIG. 5(b), which correspond to the data points above the horizontal line in FIG. 3, all give consistent estimates of $\mathcal{N}_0$, compared to the actual value of $\mathcal{N}_0$ for the true state (and the same holds for $\mathcal{N}_{1,2}$, not shown). Conversely, the estimate cannot be trusted when AIC deems the simple model inferior to the FPM (of course, it may still happen to be a correct estimate, but one could not be sure). This gives additional evidence for the success of AIC.
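A grid-based sketch of the Bayesian step for the one-parameter model of Eq. (20). Assumptions: a uniform prior on $q$, multinomial likelihoods for the $\sigma_x^{\otimes 4}$ and $\sigma_y^{\otimes 4}$ settings with $N/2$ runs each, and simulated data from $\rho_{\mathrm{actual}}$ with a fixed seed. It returns the posterior over $q$; a posterior over $\mathcal{N}_0$ as in FIG. 5 would follow by mapping each grid value of $q$ through a negativity calculation, which is omitted here. Grid, seed and names are illustrative.

```python
import numpy as np

H_X = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)    # columns |x+>, |x->
H_Y = np.array([[1, 1], [1j, -1j]], dtype=complex) / np.sqrt(2)  # columns |y+>, |y->


def kron_all(mats):
    out = np.array([[1.0 + 0j]])
    for m in mats:
        out = np.kron(out, m)
    return out


U_X, U_Y = kron_all([H_X] * 4), kron_all([H_Y] * 4)


def setting_probs(rho, u):
    """Probabilities of the 16 joint outcomes when every qubit is measured in the basis u."""
    return np.clip(np.real(np.diag(u.conj().T @ rho @ u)), 0.0, None)


def target2(phi):
    """|Psi^2_target(phi)> of Eq. (14b)."""
    psi = np.zeros(16, dtype=complex)
    psi[[3, 5, 6]] = 1.0
    psi[[9, 10, 12]] = np.exp(1j * phi)
    return psi / np.sqrt(6)


def posterior_over_q(counts_x, counts_y, phi, grid_size=1000):
    """Normalized posterior weights p(q | data) on a grid, uniform prior on (0, 1]."""
    qs = np.linspace(1e-4, 1.0, grid_size)
    psi = target2(phi)
    pure = np.outer(psi, psi.conj())
    px, py = setting_probs(pure, U_X), setting_probs(pure, U_Y)
    # Linear in q, because every outcome projector has overlap 1/16 with identity/16.
    mx = (1 - qs)[:, None] * px + qs[:, None] / 16
    my = (1 - qs)[:, None] * py + qs[:, None] / 16
    loglik = np.log(mx) @ counts_x + np.log(my) @ counts_y
    weights = np.exp(loglik - loglik.max())   # subtract the maximum to avoid underflow
    return qs, weights / weights.sum()


rng = np.random.default_rng(7)
rho_actual = 0.8 * np.outer(target2(0.0), target2(0.0).conj()) + 0.2 * np.eye(16) / 16
n_total = 1000
px_true, py_true = setting_probs(rho_actual, U_X), setting_probs(rho_actual, U_Y)
counts_x = rng.multinomial(n_total // 2, px_true / px_true.sum())
counts_y = rng.multinomial(n_total // 2, py_true / py_true.sum())

for phi in (0.0, np.pi / 6, np.pi / 3):
    qs, w = posterior_over_q(counts_x, counts_y, phi)
    print(phi, np.sum(qs * w))   # posterior mean of q
```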



F. Cross modeling for a witness measurement 

We now construct a two-parameter model similar in spirit to the one discussed for tomographically complete measurements: half the data (in which $\sigma_x^{\otimes 4}$ is measured half the time and $\sigma_y^{\otimes 4}$ the other half) are used to generate a better model, which is then tested on the other half of the data (also containing both types of measurements equally). We write



$$\rho(e,q) = (1-e)\left[(1-q)\,\rho_{\rm observation} + q\,|\Psi_{\rm target}(\phi)\rangle\langle\Psi_{\rm target}(\phi)|\right] + e\,\mathbb{1}/16. \qquad (23)$$
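Eq. (23) and the positivity test that decides which $(e, q)$ actually belong to the model can be sketched as follows. $\rho_{\rm observation}$ is treated here as a given Hermitian, trace-one input matrix. The grid-based `physical_fraction` is a hypothetical helper, one simple way to estimate the physical area of the $(e, q)$ square shown in FIG. 6. For generality the identity term is written as $\mathbb{1}/d$, which is $\mathbb{1}/16$ for four qubits.

```python
import numpy as np


def model_state(rho_obs, psi_target, e, q):
    """Eq. (23): (1-e)[(1-q) rho_obs + q |psi><psi|] + e 1/d."""
    d = rho_obs.shape[0]
    target = np.outer(psi_target, psi_target.conj())
    return (1 - e) * ((1 - q) * rho_obs + q * target) + e * np.eye(d) / d


def is_physical(rho, tol=1e-12):
    """Positive semi-definite up to a small numerical tolerance."""
    return bool(np.linalg.eigvalsh(rho).min() >= -tol)


def physical_fraction(rho_obs, psi_target, grid_points=101):
    """Fraction of a grid on the (e, q) unit square where rho(e, q) is physical."""
    values = np.linspace(0.0, 1.0, grid_points)
    flags = [is_physical(model_state(rho_obs, psi_target, e, q)) for e in values for q in values]
    return float(np.mean(flags))
```

Because $\rho_{\rm observation}$ may have negative eigenvalues, the corner $e = q = 0$ can be unphysical. Mixing in enough of the target state or of the identity restores positivity.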



To find a $\rho_{\rm observation}$ (there are many equivalent ones for predicting the outcomes of the witness measurements), we recall that a generic four-qubit state can be expressed as

$$\rho = \sum_{jklm} c_{jklm}\, \sigma_j\otimes\sigma_k\otimes\sigma_l\otimes\sigma_m, \qquad (24)$$

where $j,k,l,m = 1,2,3,4$, $\sigma_{1,2,3}$ denote the Pauli matrices $\sigma_{x,y,z}$, and $\sigma_4 = \mathbb{1}$. The witness measures the coefficients $c_{jklm}$ whose indices $j,k,l,m$ are combinations of only 1 and 4, or of only 2 and 4 (e.g., $c_{1441}$ or $c_{4222}$). We label the $c_{jklm}$ that can be recovered from the witness measurement as $c^{w}_{jklm}$ ($w$ as in witness). We do not include in $c^{w}_{jklm}$ the coefficient $c_{4444}$, which always equals 1/16 and therefore does not depend on measurement outcomes. We define

$$\rho_{\rm observation} = \sum_{jklm} c^{w}_{jklm}\, \sigma_j\otimes\sigma_k\otimes\sigma_l\otimes\sigma_m + \mathbb{1}/16. \qquad (25)$$

Note that $\rho_{\rm observation}$ can be considered a trace-one pseudostate, since it is not necessarily positive semi-definite. But the most attractive property of $\rho_{\rm observation}$ is that it preserves the measurement outcomes. It is in fact the unique pseudostate that reproduces the exact frequencies of all measurement outcomes and that has vanishing expectation values for all other unperformed collective Pauli measurements. As a component of $\rho(e,q)$, we allow $\rho_{\rm observation}$ to be unphysical, but we only keep those $\rho(e,q)$ that are positive semi-definite. We checked numerically for which values of $e$ and $q$ the states end up being physical, and how this depends on the number of measurements performed. Physical states are located in the upper right part of the square in FIG. 6. That is, only if $e$ and/or $q$ are sufficiently large, so that a sufficiently large amount of $|\Psi_{\rm target}(\phi)\rangle$ and/or $\mathbb{1}/16$ has been mixed in, does $\rho(e,q)$ become physical. Depending on the number of measurements, the upper right part covers about 69%-77% of the whole square. The physical/unphysical boundary shifts closer to the origin as the number of measurements increases.

FIG. 6: What fraction of the model of Eq. (23) describes physical states? The lower left part, separated by the curves, is where $\rho(e,q)$ of Eq. (23) is unphysical (and so is not actually included in the model), for different numbers of measurements $N$.

We test the two-parameter model (the physical part of it), and show the results in FIG. 7. We find that for $N = 100$ the AIC ranks $\rho_{2\phi}(e,q)$ better than the FPM, even when the guess about $\phi$ is very imprecise: 100 witness measurements are, unsurprisingly, not enough for a correct reconstruction of the state. When $N = 1000$, AIC only prefers the models with a value of $\phi$ within $\pi/6$ of the correct value. And when $N = 10000$, the accepted values of $\phi$ are even closer to the true value.

FIG. 7: How the AIC ranks our two-parameter model vs. the FPM, for a witness measurement: The difference between the AICs of $M_{2\phi}$ and the FPM, i.e., $-\Delta\mathrm{AIC} = \mathrm{AIC}(\mathrm{FPM}) - \mathrm{AIC}(M_{2\phi})$, for different numbers of witness measurements as a function of $\phi$. The target state is $|\Psi_{\rm target}(\phi)\rangle$. The horizontal line demarcates $\Delta\mathrm{AIC} = 0$.

The corresponding posterior distributions of the negativity $\mathcal{N}_2$ are plotted in FIG. 8 for the three better guesses, $\phi = 0, \pi/6, \pi/3$. When $N = 100$ all three give decent predictions of $\mathcal{N}_2$ (and indeed, AIC ranks those models highly). For $N = 1000$ and $N = 10000$, we would only trust the estimates arising from the lower two values of $\phi$, or just the correct value of $\phi$, respectively. This trust is rewarded in FIG. 8(b) and FIG. 8(c), as those estimates are indeed correct within the error bars. In addition, the untrusted estimate for $\phi = \pi/6$ at $N = 10000$ still happens to be correct, too.
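The construction of $\rho_{\rm observation}$ in Eqs. (24) and (25) can be sketched in Python. The sketch assumes a hypothetical data format: for each of the two settings $\sigma_x^{\otimes 4}$ and $\sigma_y^{\otimes 4}$, a record of $\pm 1$ single-qubit outcomes per shot. Because $\mathrm{Tr}(P P') = 16\,\delta_{PP'}$ for four-qubit Pauli products, each recoverable coefficient is the empirical mean of the corresponding product of outcomes divided by 16. Every product built only from $\sigma_x$ (or only from $\sigma_y$) and identities is obtained from a single setting, which gives 15 + 15 coefficients in total.

```python
import itertools

import numpy as np

SIGMA = {
    1: np.array([[0, 1], [1, 0]], dtype=complex),     # sigma_x
    2: np.array([[0, -1j], [1j, 0]], dtype=complex),  # sigma_y
    3: np.array([[1, 0], [0, -1]], dtype=complex),    # sigma_z
    4: np.eye(2, dtype=complex),                      # identity
}


def pauli_product(indices):
    """sigma_j (x) sigma_k (x) ... for an index tuple such as (1, 4, 4, 1)."""
    out = np.array([[1.0 + 0j]])
    for j in indices:
        out = np.kron(out, SIGMA[j])
    return out


def witness_coefficients(outcomes_x, outcomes_y):
    """Estimate c^w_{jklm} from +/-1 outcome arrays of shape (shots, n_qubits).

    The input format (one row of single-qubit outcomes per shot) is an assumption.
    """
    coeffs = {}
    for axis, outcomes in ((1, outcomes_x), (2, outcomes_y)):
        outcomes = np.asarray(outcomes)
        n = outcomes.shape[1]
        for mask in itertools.product((True, False), repeat=n):
            if not any(mask):
                continue  # c_{44...4} = 1/2^n is fixed, not estimated
            keep = np.array(mask, dtype=bool)
            indices = tuple(axis if m else 4 for m in mask)
            # Tr(P P') = 2^n delta_{PP'}, so c = <P> / 2^n.
            coeffs[indices] = np.prod(outcomes[:, keep], axis=1).mean() / 2**n
    return coeffs


def rho_observation(coeffs, n_qubits=4):
    """Eq. (25): the estimated Pauli terms plus the identity term 1/2^n."""
    d = 2**n_qubits
    rho = np.eye(d, dtype=complex) / d
    for indices, c in coeffs.items():
        rho += c * pauli_product(indices)
    return rho
```

By construction the result has unit trace, reproduces every measured correlator exactly, and gives zero for unperformed ones such as $\sigma_z \otimes \mathbb{1}^{\otimes 3}$. For small samples its smallest eigenvalue is typically negative, which is why it is treated as a pseudostate.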



G. Comparing one- and two-parameter models directly



Finally, the AIC can compare the one- and two-parameter models $M_{1\phi}$ and $M_{2\phi}$ directly. For that purpose one needs to use the same validation set of data, which implies that the two-parameter model needs additional data to generate $\rho_{\rm observation}$. Here we display results for just 50 witness measurements, plus an additional set of 50 measurements for $M_{2\phi}$. FIG. 9 shows that even such a small amount of additional data is useful if the angle is wrong, and, similarly, it shows that
the same small number suffices to detect a wrong single-qubit phase when it is larger than $\pi/3$.
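The key requirement here is that both models are scored on the same validation counts. A toy sketch makes this concrete. It uses a hypothetical single-qubit setting: a one-parameter model (white noise $q$ on a fixed target) and a two-parameter model (the same, plus an extra rotation angle). Each is scored by $\mathrm{AIC} = 2k - 2\ln L_{\max}$ on identical counts, with $\ln L_{\max}$ found by brute-force grid search. In the paper's setting, the additional data used to build $\rho_{\rm observation}$ for $M_{2\phi}$ would be kept separate from these validation counts.

```python
import numpy as np


def log_likelihood(rho, projectors, counts):
    """ln L = sum_i n_i ln Tr(rho E_i) for outcome projectors E_i seen n_i times."""
    probs = np.array([np.trace(rho @ proj).real for proj in projectors])
    return float(np.sum(np.asarray(counts) * np.log(np.clip(probs, 1e-300, None))))


def aic_on_validation(model, param_grid, projectors, counts):
    """AIC = 2k - 2 ln L_max, maximizing ln L by brute-force grid search."""
    best = max(log_likelihood(model(*params), projectors, counts) for params in param_grid)
    return 2 * len(param_grid[0]) - 2 * best


# Hypothetical single-qubit toy: both models are scored on the SAME validation counts.
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
counts = [80, 20]


def one_param(q):
    """Fixed target |0>, white-noise fraction q."""
    return (1 - q) * P0 + q * np.eye(2) / 2


def two_param(q, theta):
    """Target rotated by an extra angle theta, white-noise fraction q."""
    psi = np.array([np.cos(theta / 2), np.sin(theta / 2)])
    return (1 - q) * np.outer(psi, psi) + q * np.eye(2) / 2


qs = np.linspace(0.0, 1.0, 101)
thetas = np.linspace(0.0, np.pi, 61)
aic_1 = aic_on_validation(one_param, [(q,) for q in qs], [P0, P1], counts)
aic_2 = aic_on_validation(two_param, [(q, t) for q in qs for t in thetas], [P0, P1], counts)
```

Here the extra angle buys no additional likelihood, so the two-parameter model pays exactly its penalty of 2 and the one-parameter model is preferred. With a mis-specified fixed target, the extra parameter would win instead.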
FIG. 8: Quantifying entanglement from a witness measurement: The posterior distributions for $\mathcal{N}_2$, for different numbers of measurements, using model $M_{2\phi}$ with $\rho_{\rm observation}$ and target state $|\Psi_{\rm target}(\phi)\rangle$, where $\phi = 0, \pi/6, \pi/3$; $\mathcal{N}_2(\rho_{\rm actual}) \approx 0.3875$. The same prior is used as in FIG. 5. Whenever the AIC declares a model superior to the FPM, the estimated entanglement agrees, within error bars, with the actual value, but it may be wrong otherwise.
FIG. 9: Comparing one- and two-parameter models directly: The difference between the AICs of $M_{2\phi}$ and $M_{1\phi}$ for 20 different sets of witness measurements ($N = 50$) as a function of $\phi$. The target state is $|\Psi_{\rm target}\rangle$. The horizontal line demarcates $\Delta\mathrm{AIC} = 0$. The dot-dashed line is the average of all 20 points at each value of $\phi$.



IV. CONCLUSIONS 

We applied information criteria, and the Akaike Infor- 
mation Criterion (AIC) developed in Ref. [14] in partic- 
ular, to quantum state estimation. We showed it to be 
a powerful method, provided one has a reasonably good 
idea of what state one's quantum source actually gener- 
ates. 

For each given model, which may include several parameters describing error and noise, as well as some parameters (call them the ideal-state parameters) describing the state one would like to generate in the ideal (noiseless and error-free) case, the AIC determines a ranking from the observed data. One can construct multiple models, for instance, models where some ideal-state parameters and some noise parameters are fixed (possibly determined by previous experiments in the same setup), with others still considered variable. Crucially, the AIC also easily ranks the full-parameter model (FPM), which uses in principle all exponentially many parameters in the full density matrix, and which is, therefore, the model one would use in full-blown quantum state tomography. This ranking of the FPM can be accomplished by using a straightforward upper bound, without actually having to find the maximum-likelihood state (or its likelihood), a task that would quickly run into insurmountable problems for many-qubit systems.

In this way, observed data are used to justify a posteriori the use of a few-parameter model (namely, if the AIC ranks that model above the FPM). Our method is thus in the same spirit as several other recent proposals [5, 6] to simplify quantum tomography: one tentatively introduces certain assumptions on the generated quantum state, and then uses the data to certify those assumptions (if the certification fails, one at least knows the initial assumptions were incorrect).

We illustrated the method on (noisy and mis-specified) 
four-qubit members of the family of Dicke states, and 
demonstrated its effectiveness and efficiency. For in- 
stance, we showed that one can detect mis-specified ideal- 
state parameters and determine noise and error parame- 
ters. We also showed by example the successful applica- 
tion of the method to a specific and useful subtask, that 
of quantifying multi-qubit entanglement. 